CS 374: Algorithms in Biology (Fall 2006) Marc A. Schaub Paper reference
Abstract
Population genetics models are useful for obtaining background expectations about genetic variation. In this work, the authors calibrate such a model using empirical data sets, specifically Single Nucleotide Polymorphism (SNP) data from three populations (West Africa, East Asia and Europe). The authors use a total of 15 measures to compare the simulated data to the empirical observations. For each measure they compute the root-mean-square error (RMSE) of the simulated value with respect to the mean empirical value. The overall goodness of fit of a particular model is obtained by calculating the total RMS discrepancy over all measures. The calibration starts from a standard neutral model that includes the separation between the African and the non-African populations, and the subdivision of the latter into the European and East Asian populations. This model has an RMSE of 4.7. The model is refined using a stepwise approach: a set of parameters is added to the model and those parameters are optimized in order to minimize the error. This step is then repeated with additional parameters. In the first of these steps, the RMSE for the single-locus measures is reduced to 1.15 by increasing the fraction of low-frequency alleles and by adding population bottlenecks and migration between populations to the model. The recombination rate is then optimized to match the observed heterozygosity. The recombination model is improved in order to obtain a larger, and thus more realistic, level of linkage disequilibrium (LD) than with the neutral model. This is done by adding large-scale variations as well as fine-scale variations, such as localized hotspots, to the model. The best-fitting model has an overall RMSE of 1.35 with respect to the mean empirical values.
The model is evaluated by generating predictions for the X chromosome of the same populations, which was not used during the calibration. The calibrated model performs significantly better than the neutral model (RMSE of 0.97 instead of 1.51). The calibrated model is also shown to simulate haplotype blocks significantly better than the neutral model. Finally, the authors show that the variation found in a set of 100 genes is reproduced by the calibrated model and can thus be explained without having to hypothesize positive selection.
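The goodness-of-fit computation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the summary does not say how the measures are scaled or weighted, so the sketch assumes all measures are on comparable scales and weighted equally. The function names `rmse` and `total_rms`, the measure names, and all numbers are hypothetical.

```python
import math


def rmse(simulated, empirical_mean):
    """RMSE of simulated replicate values about one empirical mean."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    squared = [(s - empirical_mean) ** 2 for s in simulated]
    return math.sqrt(sum(squared) / len(squared))


def total_rms(per_measure_rmse):
    """Combine per-measure RMSEs into one RMS discrepancy.

    Assumption: equal weighting of all measures.
    """
    values = list(per_measure_rmse)
    if not values:
        raise ValueError("need at least one measure")
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


# Hypothetical values for illustration only (not data from the paper).
simulated = {"measure_a": [1.0, 3.0], "measure_b": [2.0, 2.0]}
empirical = {"measure_a": 2.0, "measure_b": 2.0}

# One RMSE per measure, then a single overall discrepancy for the model.
per_measure = {name: rmse(vals, empirical[name]) for name, vals in simulated.items()}
overall = total_rms(per_measure.values())
```

Under these assumptions, a stepwise calibration would recompute `overall` after each parameter change and keep the change only if the value decreases.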
Publication date: 2006